Abstract — It has been well established that reverse-carpooling-based
network coding can significantly improve the efficiency of multi-hop
wireless networks. However, in a stochastic environment, when there
are no opportunities to code because some packets lack coding pairs,
should these packets wait for a future opportunity or should they be
transmitted without coding? To help answer that question, we
formulate a stochastic dynamic program with the objective of
minimizing the long-run average cost per unit time incurred due to
transmissions and delays. In particular, we develop optimal control
actions that balance the cost of transmissions against the cost of
delays. In that process we address a crucial question: what should
be observed as the state of the system? We show analytically that
the queue lengths alone suffice if the system can be modeled as a
Markov process. Subsequently, we show that a stationary policy based
on queue lengths is optimal and describe a procedure to find such a
policy. We further substantiate our results with simulation
experiments in more general settings.

I. Introduction 

In recent years, there has been growing interest in the
applications of network coding techniques in wireless
communication networks. It has been shown that network coding can
significantly improve the performance of multi-hop wireless
networks. For example, consider the wireless network coding scheme
depicted in Figure 1(a). Here, wireless nodes $n_1$ and $n_2$ need
to exchange packets $x_1$ and $x_2$ through a relay node. A simple
store-and-forward approach needs four transmissions. However, the
network coding solution uses a store-code-and-forward approach in
which the two packets $x_1$ and $x_2$ are combined by means of a
bitwise XOR operation at the relay and broadcast to nodes $n_1$ and
$n_2$ simultaneously. Nodes $n_1$ and $n_2$ can then decode this
coded packet to obtain the packets they need, so the exchange takes
three transmissions instead of four.
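The store-code-and-forward exchange of Figure 1(a) can be illustrated with a few lines of Python. This is a minimal sketch, assuming equal-length packets represented as byte strings; the payloads and the helper name `xor_packets` are hypothetical, and a real system would pad packets and carry coding headers.

```python
def xor_packets(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length; pad the shorter one")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical payloads: x1 is sent by n1 (wanted by n2), x2 by n2 (wanted by n1).
x1 = b"hello from n1"
x2 = b"reply from n2"

# After two uplink transmissions, the relay broadcasts ONE coded packet
# instead of forwarding x1 and x2 separately.
coded = xor_packets(x1, x2)

# Each end node XORs the broadcast with the packet it already holds.
decoded_at_n1 = xor_packets(coded, x1)  # recovers x2
decoded_at_n2 = xor_packets(coded, x2)  # recovers x1
print(decoded_at_n1, decoded_at_n2)
```

Because XOR is its own inverse, each node needs only the packet it sent plus the single broadcast to recover the other node's packet.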

Effros et al. introduced the strategy of reverse carpooling, which
allows two information flows traveling in opposite directions to
share a path. Figure 1(b) shows an example of two connections, from
$n_1$ to $n_4$ and from $n_4$ to $n_1$, that share a common path
$(n_1, n_2, n_3, n_4)$. The wireless network coding approach results
in a significant (up to 50%) reduction in the number of
transmissions for two connections that use reverse carpooling. In
particular, once the first connection is established, the second
connection (of the same rate) can be established in the opposite
direction with little additional cost.

Fig. 1. (a) Wireless network coding. (b) Reverse carpooling. (c)
3-node relay network.

In this paper, we focus on the design and analysis of scheduling
protocols that exploit the fundamental trade-off between the number
of transmissions and delay in reverse carpooling schemes. In
particular, to cater to delay-sensitive applications, the network
must be aware that the savings achieved by coding may be offset by
the delays incurred while waiting for coding opportunities.

Consider a relay node that transmits packets between two of its
adjacent nodes that have flows in opposite directions, as depicted
in Figure 1(c). The relay maintains two queues, $q_1$ and $q_2$,
such that $q_1$ and $q_2$ store packets that need to be delivered to
node 2 and node 1, respectively. If both queues are non-empty, then
the relay can send two packets from these queues in one
transmission by performing an XOR operation. However, what should
the relay do if one of the queues has packets to transmit while the
other queue is empty? Should the relay wait for a coding
opportunity or just transmit a packet from the non-empty queue
without coding? This is the fundamental question we seek to answer.
In essence, we would like to trade off efficient transmission of
packets against high quality of service (i.e., low delays).

A. Related Work 

Network coding research was initiated by the seminal work of
Ahlswede et al. and has since attracted major interest from the
research community. Many initial works on network coding focused on
establishing multicast connections between a fixed source and a set
of terminal nodes.

Network coding for wireless networks has been considered by Katabi
et al. [3]. They propose an architecture called COPE, which contains
a special network coding layer between the IP and MAC layers. In our
earlier work [4], we showed how to design coding-aware routing
controllers that maximize coding opportunities (and hence reduce
the number of transmissions) in multi-hop networks. In contrast to
all the above literature, our objective here is to study the
delicate trade-off between transmission costs and waiting costs
when network coding is an option.

B. Main Results 

Our objective is to develop policies that yield a transmit/do-not-
transmit decision at each time instant. We define our objective as
the long-run average transmission cost plus waiting-time cost on a
per-unit-time basis. Thus, the relay can either transmit and incur a
transmission cost, or wait in the hope that a codable packet will
arrive, which would allow the transmission cost to be shared between
two packets.

We first consider the case of a single relay and assume that
arrivals into both queues follow independent Bernoulli processes.
We find that the optimal policy is a stationary queue-length
threshold policy with one threshold for each queue at the relay,
whose action is simply: if a coding opportunity exists, code and
transmit; otherwise, transmit a packet if the threshold for that
queue is reached. We show how to find the optimal thresholds, and we
derive exact expressions for the expected cost based on the
stationary distribution of the Markov chain induced by this policy.
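The decision rule just described can be sketched as a small Python function. This is an illustrative sketch, not the authors' implementation; the name `relay_action` is hypothetical, and it assumes the convention used by the Markov chain in the proof of Theorem 8, namely that at most $L_i$ packets of type $i$ are held over to the next slot, so queue $i$ is served without coding only when its post-arrival length exceeds $L_i$.

```python
from typing import Tuple


def relay_action(y1: int, y2: int, L1: int, L2: int) -> Tuple[str, int, int]:
    """Decide what the relay does at its transmission opportunity.

    y1, y2: queue lengths after this slot's arrivals.
    L1, L2: per-queue thresholds (assumed convention: at most L_i packets
            of type i are held over, so queue i is served uncoded only
            when it holds more than L_i packets).
    Returns the action and the queue lengths after it.
    """
    if y1 > 0 and y2 > 0:
        # Coding opportunity: one XOR-coded transmission serves both queues.
        return "code", y1 - 1, y2 - 1
    if y1 > L1:
        return "send_uncoded_1", y1 - 1, y2
    if y2 > L2:
        return "send_uncoded_2", y1, y2 - 1
    # Nothing to send, or below threshold: wait for a coding partner.
    return "wait", y1, y2


print(relay_action(1, 1, 2, 2))  # both queues non-empty
print(relay_action(3, 0, 2, 2))  # threshold exceeded
print(relay_action(2, 0, 2, 2))  # hold and wait
```

Setting $L_1 = L_2 = 0$ in this sketch recovers the opportunistic rule, in which a packet without a coding partner is sent immediately.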

II. System Overview 

Consider a multi-hop wireless network operating a time-division
multiplexing scheme to store and forward packets from various
sources to destinations. Time is slotted into small intervals, and
in each interval every node gets to transmit at most one packet of
a flow. This packet is transmitted during a "mini-slot" that has
been assigned to the node. We assume that this mini-slot is
instantaneous for all practical purposes. In this paper we do not
consider scheduling issues; we assume that each node has scheduled
mini-slots for each flow, in which it has an opportunity to transmit
if it chooses to. With that said, we now describe the scenario from
the perspective of a single node, specifically a relay that has the
potential to network-code packets from flows in opposite
directions. We will revert to the entire network only in Section VI.



A. Scenario from a Relay's Perspective

Consider Figure 1(c). We refer to two of the nodes adjacent to relay
R as nodes 1 and 2. Suppose there is a flow $f_1$ that goes from
node 1 to node 2 and another flow $f_2$ from node 2 to node 1, both
of which pass through the relay under consideration. The packets
from the two flows (type 1 and type 2, respectively) go through
separate queues, $q_1$ and $q_2$, at node R. With respect to the
relay, we now define a slot as the time between successive
opportunities for the relay to transmit. In each slot, a packet
arrives from node $i$ (during its transmission opportunity) to $q_i$
with probability $p_i$, for $i = 1, 2$; with probability $1 - p_i$,
no packet arrives from node $i$ in that slot. Thus, at most one
packet arrives from each adjacent node to the relay during a slot
(this follows from the network definition and scheduling described
earlier). At the end of a slot, the relay gets an opportunity to
transmit.

Notice that at the end of a slot, the relay can transmit at most one
packet. When both queues are non-empty, one packet from $q_1$ and one
from $q_2$ can be transmitted together as a single packet using XOR
coding. This scenario, in which transmitting a combination of
packets decreases the required number of transmissions, is referred
to as a coding opportunity. Whenever such a coding opportunity
exists between the packets of the two flows, the relay encodes the
packets and transmits the coded packet to both adjacent nodes.
However, if there is only one type of packet at the end of a slot,
there are two options: (a) one of those packets is transmitted
without coding, or (b) the relay waits for a future slot to receive
a matching packet in the other queue and exploit the coding
opportunity. We assume that packets of the same type are
transmitted on a first-in-first-out basis.

Note that the relay node faces one of three kinds of situations: (i)
at least one packet of each type; (ii) packets of only one type;
(iii) no packets. The decision in situations (i) and (iii) is
straightforward: one would code using XOR and transmit in situation
(i), and do nothing in situation (iii). In situation (ii), however,
the best course of action is unclear: do nothing (thus worsening
delay) or transmit without coding (thus being inefficient). In other
words, to wait or not to wait, that is the question.

B. Markov Decision Process Model 

To determine the relay's best course of action at every
transmission opportunity, we use a Markov decision process (MDP)
model. For $i = 1, 2$ and $n = 0, 1, 2, \ldots$, let $Y_n^i$ be the
number of packets in queue $i$ at the end of time slot $n$, just
before an opportunity to transmit. Let $A_n$ be the action chosen at
the end of the $n$th time slot, with $A_n = 0$ implying the action is
to do nothing and $A_n = 1$ implying the action is to transmit. As
described before, if $Y_n^1 + Y_n^2 = 0$, then $A_n = 0$ because that
is the only feasible action. Also, if $Y_n^1 Y_n^2 > 0$, then
$A_n = 1$ because the best option is to transmit a coded XOR packet,
as it reduces both the number of transmissions and the latency.
However, when exactly one of $Y_n^1$ and $Y_n^2$ is non-zero, the best
course of action is unclear.



To develop a strategy for that case, we first define costs for
latency and transmission. Let $C_t$ be the cost of transmitting a
packet and $C_h$ the cost of holding a packet for a length of time
equal to one slot. Without loss of generality, we assume that a
packet transmitted in the same slot in which it arrived has zero
latency. Also, the cost of transmitting a coded packet is the same as
that of a non-coded packet. Our objective is to derive an optimal
policy that minimizes the long-run average cost per slot. To this
end, we define the MDP $\{(Y_n, A_n), n \ge 0\}$, where
$Y_n = (Y_n^1, Y_n^2)$ is the state of the system and $A_n$ is the
control action chosen at time $n$. The state space (i.e., the set of
all possible values of $Y_n$) is
$\{(i,j) \in \mathbb{Z}_{\ge 0}^2 : j \le 1 \text{ or } i \le 1\}$.

Let $C(Y_n, A_n)$ be the cost incurred at time $n$ if action $A_n$ is
taken when the system is in state $Y_n$. Therefore,

$$C(Y_n, A_n) = C_h\left([Y_n^1 - A_n]^+ + [Y_n^2 - A_n]^+\right) + C_t A_n, \qquad (1)$$

where $[x]^+ = \max(x, 0)$. The long-run average cost of a policy $u$
is given by

$$V(u) = \lim_{N \to \infty} \frac{1}{N+1}\, E_u\left[\sum_{n=0}^{N} C(Y_n, A_n) \,\middle|\, Y_0 = (0,0)\right], \qquad (2)$$

where $E_u$ is the expectation operator for the system under policy
$u$. Notice that our initial state is an empty system, although the
average cost does not depend on it. Our goal is to obtain the
optimal policy $u^*$ that minimizes $V(u)$. To that end, we first
describe the probability law of our MDP and then, in the subsequent
section, develop a methodology to obtain the optimal policy $u^*$.
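The per-slot cost (1) and the long-run average (2) can be estimated by simulation. The sketch below is a hypothetical Python illustration, assuming the Bernoulli arrival model of this section and a policy given as a function of the state $Y_n$; the helper names and the threshold convention (send uncoded once a queue exceeds $L_i$) are assumptions. It approximates the limit in (2) by a finite-horizon sample average from a single run, so it yields an estimate rather than the exact $V(u)$.

```python
import random
from typing import Callable, Tuple

State = Tuple[int, int]
Policy = Callable[[State], int]


def slot_cost(y: State, a: int, c_t: float, c_h: float) -> float:
    """Eq. (1): holding cost of packets left after action a, plus C_t if a == 1."""
    y1, y2 = y
    return c_h * (max(y1 - a, 0) + max(y2 - a, 0)) + c_t * a


def threshold_policy(L1: int, L2: int) -> Policy:
    """Hypothetical helper: code when possible, else send once a queue exceeds L_i."""
    def policy(y: State) -> int:
        y1, y2 = y
        if y1 > 0 and y2 > 0:
            return 1
        return 1 if (y1 > L1 or y2 > L2) else 0
    return policy


def estimate_average_cost(policy: Policy, p1: float, p2: float, c_t: float,
                          c_h: float, n_slots: int = 100_000, seed: int = 0) -> float:
    """Sample-average estimate of V(u) in (2), starting from Y_0 = (0, 0)."""
    rng = random.Random(seed)
    y1, y2 = 0, 0
    total = 0.0
    for _ in range(n_slots):
        a = policy((y1, y2))
        total += slot_cost((y1, y2), a, c_t, c_h)
        # Queues after the action, then Bernoulli arrivals during the next slot.
        y1 = max(y1 - a, 0) + (1 if rng.random() < p1 else 0)
        y2 = max(y2 - a, 0) + (1 if rng.random() < p2 else 0)
    return total / n_slots


print(estimate_average_cost(threshold_policy(1, 1), 0.5, 0.5, c_t=1.0, c_h=1.0))
```

For $p_1 = p_2 = 0.5$, $L_1 = L_2 = 1$, and $C_t = C_h = 1$, the estimate should be close to 1.25, which is the value the closed-form expressions of Theorem 8 give for this case.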

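For the MDP $\{(Y_n, A_n), n \ge 0\}$, the probability law can be
derived for $i \ge 0$ and $j \ge 0$ as follows, in terms of
$P_a(Y_n, Y_{n+1})$, the transition probability from state $Y_n$ to
$Y_{n+1}$ under action $a \in \{0, 1\}$:

$$
\begin{aligned}
P_1\big((i,j),\,([i-1]^+,\,[j-1]^+)\big) &= \bar p_1, \\
P_1\big((i,j),\,(\max(i,1),\,[j-1]^+)\big) &= \bar p_2, \\
P_1\big((i,j),\,([i-1]^+,\,\max(j,1))\big) &= \bar p_3, \\
P_1\big((i,j),\,(\max(i,1),\,\max(j,1))\big) &= \bar p_4, \\
P_0\big((i,j),\,(i,\,j)\big) &= \bar p_1, \\
P_0\big((i,j),\,(i+1,\,j)\big) &= \bar p_2, \\
P_0\big((i,j),\,(i,\,j+1)\big) &= \bar p_3, \\
P_0\big((i,j),\,(i+1,\,j+1)\big) &= \bar p_4,
\end{aligned}
$$

where $\bar p_1 = (1-p_1)(1-p_2)$, $\bar p_2 = p_1(1-p_2)$,
$\bar p_3 = (1-p_1)p_2$, and $\bar p_4 = p_1 p_2$ are the probabilities
of no arrival, a type-1 arrival only, a type-2 arrival only, and
arrivals of both types in a slot, respectively. Also note the
caveats that $i$ and $j$ cannot both be greater than 1; if
$i = j = 0$, then $A_n = 0$; and if $i > 0$ and $j > 0$, then
$A_n = 1$.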
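The transition law above can be written as a small Python function, which is convenient for checking that the probabilities out of each state sum to one. This is a sketch under the stated model; the function name `transition_probs` is hypothetical, and infeasible state-action pairs (transmitting from an empty relay, or waiting when a coding opportunity exists) are rejected, following the caveats above.

```python
from typing import Dict, Tuple

State = Tuple[int, int]


def transition_probs(s: State, a: int, p1: float, p2: float) -> Dict[State, float]:
    """Return P_a(s, .) for the relay MDP with Bernoulli(p1), Bernoulli(p2) arrivals."""
    i, j = s
    if a == 1 and i == 0 and j == 0:
        raise ValueError("transmitting from an empty relay is infeasible")
    if a == 0 and i > 0 and j > 0:
        raise ValueError("a coding opportunity is always used (A_n = 1)")
    # Queue lengths right after the action: a transmission removes one packet
    # from every non-empty queue (one coded or one uncoded packet).
    k0, l0 = (max(i - 1, 0), max(j - 1, 0)) if a == 1 else (i, j)
    pb1 = (1 - p1) * (1 - p2)  # no arrival
    pb2 = p1 * (1 - p2)        # type-1 arrival only
    pb3 = (1 - p1) * p2        # type-2 arrival only
    pb4 = p1 * p2              # both types arrive
    return {
        (k0, l0): pb1,
        (k0 + 1, l0): pb2,
        (k0, l0 + 1): pb3,
        (k0 + 1, l0 + 1): pb4,
    }


print(transition_probs((2, 0), 1, 0.3, 0.6))
```

For example, from state (2, 0) with action 1, $p_1 = 0.3$, and $p_2 = 0.6$, the next state is (1, 0), (2, 0), (1, 1), or (2, 1) with probabilities 0.28, 0.12, 0.42, and 0.18, matching the four $P_1$ rows above (note that $\max(i,1) = [i-1]^+ + 1$).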

III. Analysis 

As described in the previous section, our goal is to obtain the
optimal policy $u^*$ that minimizes $V(u)$, defined in (2). To that
end, we first identify the space of possible policies and then find
the optimal policy within this space. Our first question is: what is
the appropriate state space? Is it just the queue lengths, or should
we also consider waiting times?



A. Should we maintain waiting time information?

Intuition suggests that a packet that has not waited long can
perhaps afford to wait a little longer, whereas a packet that has
waited too long may be better off transmitted right away. This seems
logical: we try our best to code, but we cannot wait too long
because waiting incurs holding costs. Moreover, waiting-time
information is readily available from packet time-stamps. Given
that, would we make better decisions by also keeping track of the
waiting time of each packet? We answer that question by means of a
theorem that requires the following lemma for a generic MDP
$\{(X_n, D_n), n \ge 0\}$, where $X_n$ is the state of the MDP and
$D_n$ is the action at time $n$.

Lemma 1: (Puterman [5]) For an MDP $\{(X_n, D_n), n \ge 0\}$, given
any history-dependent policy and starting state, there exists a
randomized Markov policy with the same long-run average cost.

Using the above lemma we show next that it is not necessary 
to maintain waiting time information. 

Theorem 2: For the MDP $\{(Y_n, A_n), n \ge 0\}$, if there exists a
randomized history-dependent policy that is optimal, then there
exists a randomized Markov policy $u^*$ that minimizes $V(u)$ defined
in (2). Further, no policy that also uses waiting-time information
can yield a lower cost than $V(u^*)$.

Proof: From Lemma 1, if the MDP $\{(Y_n, A_n), n \ge 0\}$ has a
history-dependent policy that is optimal, then we can construct a
randomized Markov policy that yields the same long-run average cost
given $Y_0 = (0,0)$. Therefore, if there exists a randomized
history-dependent policy that is optimal, then there exists a
randomized Markov policy $u^*$ that minimizes $V(u)$ defined in (2).

Knowing the entire history of states and actions, one can always
determine the history of waiting times as well as the current
waiting times of all packets. Therefore, the optimal policy $u'$ that
uses waiting-time information is equivalent to a history-dependent
policy. From Lemma 1, we can always find a randomized Markov policy
that yields the same optimal cost $V(u')$. ■

B. Structure of the optimal policy 

In the previous section, we showed that there exists an optimal
policy that does not include the waiting time in the state of the
system. In this section we focus on queue-length-based policies and
determine the structure of the optimal policy. In the MDP literature
(see Sennott [6]), the conditions for the structure and location of
the optimal policy usually rely on the results for the
infinite-horizon $\beta$-discounted cost case, letting $\beta$
approach 1 to obtain the average cost case. Accordingly, for our MDP
$\{(Y_n, A_n), n \ge 0\}$, the total expected discounted cost
incurred by a policy $\theta$ is

$$V_{\theta,\beta}(i,j) = E_\theta\left[\sum_{n=0}^{\infty} \beta^n C(Y_n, A_n) \,\middle|\, Y_0 = (i,j)\right]. \qquad (4)$$

In addition, we define $V_\beta(i,j) = \min_\theta V_{\theta,\beta}(i,j)$
as well as $v_\beta(i,j) = V_\beta(i,j) - V_\beta(0,0)$.



Proposition 3: $V_\beta(i,j)$ is finite for all $i$, $j$, and discount
factor $\beta$.

Proof: We denote the stationary policy of always transmitting by
$\delta$. Hence, $V_\beta(i,j) \le V_{\delta,\beta}(i,j)$. Furthermore,
we note that $V_{\delta,\beta}(i,j) \le \bar C(i,j)/(1-\beta)$, where
$\bar C(i,j) \triangleq C_h([i-1]^+ + [j-1]^+) + C_t$. Thus,
$V_\beta(i,j) < \infty$, and the proposition follows. ■

Proposition 3 implies that $V_\beta(i,j)$ satisfies the optimality
equation [5]

$$V_\beta(i,j) = \min_{a \in \{0,1\}} \Big\{ C_h\big([i-a]^+ + [j-a]^+\big) + C_t a + \beta \sum_{k,l} V_\beta(k,l)\, P_a\big((i,j),(k,l)\big) \Big\}. \qquad (5)$$
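The optimality equation (5) can be solved numerically by value iteration on a truncated state space, which also lets one inspect the optimal action in each state $(i, 0)$ and read off the index $i^*$ used in the proof of Theorem 7. The sketch below is a hypothetical Python illustration, not part of the analysis: the truncation level `N` is an assumption (states with a queue at the cap are forced to transmit so that the truncated chain stays finite), and ties between the two actions are broken in favor of waiting.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]


def build_states(N: int) -> List[State]:
    """Truncated state space: 0 <= i, j <= N with i <= 1 or j <= 1."""
    return [(i, j) for i in range(N + 1) for j in range(N + 1) if i <= 1 or j <= 1]


def feasible_actions(s: State, N: int) -> List[int]:
    i, j = s
    if i == 0 and j == 0:
        return [0]          # nothing to send
    if i > 0 and j > 0:
        return [1]          # always code when possible
    if i >= N or j >= N:
        return [1]          # truncation assumption: never exceed the cap N
    return [0, 1]           # wait or transmit uncoded


def successors(s: State, a: int, p1: float, p2: float) -> List[Tuple[State, float]]:
    i, j = s
    if a == 1:
        i, j = max(i - 1, 0), max(j - 1, 0)
    return [((i, j), (1 - p1) * (1 - p2)), ((i + 1, j), p1 * (1 - p2)),
            ((i, j + 1), (1 - p1) * p2), ((i + 1, j + 1), p1 * p2)]


def value_iteration(p1: float, p2: float, c_t: float, c_h: float, beta: float,
                    N: int = 30, tol: float = 1e-9, max_iter: int = 100_000):
    """Iterate the discounted optimality equation (5) to (near) convergence."""
    states = build_states(N)
    V: Dict[State, float] = {s: 0.0 for s in states}

    def q_value(s: State, a: int) -> float:
        i, j = s
        cost = c_h * (max(i - a, 0) + max(j - a, 0)) + c_t * a
        return cost + beta * sum(pr * V[t] for t, pr in successors(s, a, p1, p2))

    for _ in range(max_iter):
        new_V = {s: min(q_value(s, a) for a in feasible_actions(s, N)) for s in states}
        gap = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if gap < tol:
            break
    # min() returns the first minimizer, so ties favor waiting (action 0).
    policy = {s: min(feasible_actions(s, N), key=lambda a: q_value(s, a)) for s in states}
    return V, policy


def first_transmit_state(policy: Dict[State, int], N: int) -> int:
    """Smallest i >= 1 at which the policy transmits uncoded from (i, 0)."""
    for i in range(1, N + 1):
        if policy[(i, 0)] == 1:
            return i
    return N


V, policy = value_iteration(0.5, 0.5, c_t=5.0, c_h=1.0, beta=0.9)
print("first transmitting state i* =", first_transmit_state(policy, 30))
```

As a sanity check, when transmissions are free ($C_t = 0$) the sketch never holds a packet, so it transmits already in state (1, 0).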

The next lemma gives conditions under which an optimal stationary
policy exists.

Lemma 4: (Sennott [6]) There exists a stationary policy that is
optimal for the MDP $\{(Y_n, A_n), n \ge 0\}$ if the following
conditions are satisfied: (i) $V_\beta(i,j)$ is finite for all $i$,
$j$, and discount factor $\beta$; (ii) there exists a nonnegative $N$
such that $v_\beta(i,j) \ge -N$ for all $i$, $j$, and $\beta$; and
(iii) there exists a nonnegative $M_{ij}$ such that
$v_\beta(i,j) \le M_{ij}$ and

$$\sum_{k,l} P_a\big((i,j),(k,l)\big)\, M_{kl} < \infty \qquad (6)$$

for every $i$, $j$, $\beta$, and action $a$.

Using Lemma 4, we show next that the MDP defined in this paper has an
optimal policy that is stationary.

Theorem 5: For the MDP $\{(Y_n, A_n), n \ge 0\}$, there exists a
stationary policy $u^*$ that minimizes $V(u)$ defined in (2).

Proof: It suffices to show that the three conditions in Lemma 4 are
met. Proposition 3 yields condition (i), while condition (ii) on
finding an $N$ is also straightforward since all the costs are
positive.

Considering the policy $\delta$ of always transmitting, an upper
bound on $V_\beta(i,j)$ can be obtained from (5) as
$C^{1,0}_{i,j} V_\beta(1,0) + C^{0,1}_{i,j} V_\beta(0,1) + C^{0,0}_{i,j} V_\beta(0,0)$
for some finite-valued $C^{k,l}_{i,j}$. It therefore remains to show
that there exist nonnegative finite-valued $M_{1,0}$ and $M_{0,1}$
such that $v_\beta(1,0) \le M_{1,0}$ and $v_\beta(0,1) \le M_{0,1}$.
Note that we take $A_n = 1$ in state $(1,1)$, and therefore

$$V_\beta(1,1) = \tilde C + \tilde\beta\big(\bar p_1 V_\beta(0,0) + \bar p_2 V_\beta(1,0) + \bar p_3 V_\beta(0,1)\big), \qquad (7)$$

where $\tilde C = C_t/(1 - \beta \bar p_4)$ and
$\tilde\beta = \beta/(1 - \beta \bar p_4)$. Hence, upper bounds on
$V_\beta(1,0)$ and $V_\beta(0,1)$ can be obtained from (5) and (7) as
follows:

$$V_\beta(1,0) \le \frac{C_t + \beta \bar p_4 \tilde C + \beta \bar p_1 (1 + \tilde\beta \bar p_4) V_\beta(0,0) + \beta \bar p_3 (1 + \tilde\beta \bar p_4) V_\beta(0,1)}{1 - \beta \bar p_2 (1 + \tilde\beta \bar p_4)}, \qquad (8)$$

$$V_\beta(0,1) \le \frac{C_t + \beta \bar p_4 \tilde C + \beta \bar p_1 (1 + \tilde\beta \bar p_4) V_\beta(0,0) + \beta \bar p_2 (1 + \tilde\beta \bar p_4) V_\beta(1,0)}{1 - \beta \bar p_3 (1 + \tilde\beta \bar p_4)}. \qquad (9)$$

Based on (8) and (9), condition (iii) is satisfied because there
exist nonnegative finite-valued $M_{0,1}$ and $M_{1,0}$ such that
$V_\beta(1,0) \le V_\beta(0,0) + M_{1,0}$ and
$V_\beta(0,1) \le V_\beta(0,0) + M_{0,1}$. ■

Now that we know that the optimal policy is stationary, the question
is how to find it. The standard methodology to obtain a stationary
policy for an infinite-horizon average-cost minimization problem is
to use a linear program, as described below.

Consider a generic MDP $\{(X_n, D_n), n \ge 0\}$, where $X_n$ is the
state and $D_n$ is the action at time $n$. Assume that the MDP has a
finite number of states and a finite number of possible actions, and
that the Markov chain resulting from any policy is irreducible. Let
$u$ be a stationary randomized policy described for state $X_n = i$
and action $D_n = a$ as follows:

$$u_{ia} = P\{D_n = a \mid X_n = i\}$$

for all $i$ in the state space and all $a$ in the action space. Note
that $u_{ia}$ is the probability of choosing action $a$ when the
system is in state $i$. Further, define the expected cost incurred
when the system is in state $i$ and the action is $a$ as

$$c_{ia} = E[C(X_n, D_n) \mid X_n = i, D_n = a],$$

where $C(X_n, D_n)$ is the cost incurred at time $n$ if action $D_n$
is taken when the system is in state $X_n$.

Lemma 6: (Serin and Kulkarni) The optimal randomized policy
$u^*_{ia}$ that minimizes the long-run average cost per unit time
(equal to the length of a slot) can be computed as

$$u^*_{ia} = \frac{x^*_{ia}}{\sum_{b} x^*_{ib}},$$

where $x^* = [x^*_{ia}]$ is the optimal solution to the linear program

$$
\begin{aligned}
\text{Minimize} \quad & \sum_i \sum_a c_{ia}\, x_{ia} \\
\text{subject to} \quad & \sum_i \sum_a x_{ia} = 1, \\
& \sum_a x_{ja} - \sum_i \sum_a P_{ij}(a)\, x_{ia} = 0 \quad \forall j, \\
& x_{ia} \ge 0 \quad \forall i, a,
\end{aligned}
$$

and $P_{ij}(a)$ is the probability of moving from state $i$ to state
$j$ under action $a$.

As described in Ross, the linear program (LP) produces, for each $i$,
optimal values $x^*_{ia}$ that are zero for all but one action $a$, so
that $u^*_{ia} = 1$ for that action. Hence, the optimal policy is in
fact a stationary deterministic policy.

However, we cannot directly apply the above results to our MDP
$\{(Y_n, A_n), n \ge 0\}$, because it has infinitely many states and
not every policy induces an irreducible Markov chain (for example,
if we always transmit, some states cannot be reached). To circumvent
this, we construct a finite-size LP with $N$ states and force it to
be irreducible by creating dummy transitions with probability
$\epsilon > 0$ between some states. Let us call this
LP$(N, \epsilon)$. From the lemma above, LP$(N, \epsilon)$ has an
optimal stationary deterministic policy. By letting $N \to \infty$ and
$\epsilon \to 0$, we argue that our MDP has an optimal deterministic
policy. That said, solving LP$(N, \epsilon)$ for large $N$ and small
$\epsilon$ is not an efficient way to obtain the optimal policy.

We now know that the optimal policy is stationary and deterministic.
But how do we find it? If we know that the optimal policy satisfies
some structural properties, then we can search through the space of
stationary deterministic policies with those properties and obtain
the optimal one.

Theorem 7: For the MDP $\{(Y_n, A_n), n \ge 0\}$, the optimal policy
is of threshold type. There exist optimal thresholds $L_1$ and $L_2$
such that the optimal deterministic action in states $(i,0)$ is to
wait if $i \le L_1$ and to transmit without coding if $i > L_1$, while
in states $(0,j)$ it is to wait if $j \le L_2$ and to transmit without
coding if $j > L_2$.

Proof: Using the same notation as in (5), define
$V_\beta(i,0) = \min_{a \in \{0,1\}} V_{\beta,a}(i)$, where
$V_{\beta,a}(i) = C_h [i-a]^+ + C_t a + \beta \sum_{k,l} V_\beta(k,l)\, P_a\big((i,0),(k,l)\big)$.
Let $i^* = \min\{i \in \mathbb{Z}^+ : V_{\beta,0}(i) > V_{\beta,1}(i)\}$.
Then the optimal stationary deterministic action (for the total
expected $\beta$-discounted cost) is $A_n = 0$ for the states $(i,0)$
with $i < i^*$, and $A_n = 1$ for the state $(i^*,0)$; that is, the
threshold is $L_1 = i^* - 1$. Note that we need not consider the
states $(i,0)$ with $i > i^*$, since they are not accessible: $(i^*,0)$
only transits to $(i^*-1,0)$, $(i^*,0)$, $(i^*-1,1)$, and $(i^*,1)$.
A similar argument applies to the states $(0,j)$.

Thus far we have shown that a threshold-based policy is optimal for
the total expected $\beta$-discounted cost. It remains to show that
$V(u)$ in (2) exists when $u$ is a threshold-based policy. Since the
steady-state probability distribution of the Markov chain induced by
the MDP $\{(Y_n, A_n), n \ge 0\}$ under the threshold-based policy
can be computed as in Theorem 8,
$\lim_{N\to\infty} E\big[\sum_{n=0}^{N} C(Y_n, A_n) \mid Y_0 = (0,0)\big]/(N+1)$
converges [10]. The threshold-based policy is therefore also optimal
for the long-run average cost. ■



C. Analysis: obtaining the optimal deterministic stationary policy

We have shown in the previous section that the optimal policy is
stationary, deterministic, and of threshold type. The next step is
to find it. Notice that we only need to consider the subset of
deterministic stationary policies of threshold type.

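Theorem 8: The optimal thresholds $L_1^*$ and $L_2^*$ are

$$(L_1^*, L_2^*) = \arg\min_{L_1, L_2} \; C_t\, \Gamma(L_1, L_2) + C_h\, \Lambda(L_1, L_2), \qquad (10)$$

where

$$\Gamma(L_1,L_2) = p_1 p_2 \pi_{0,0} + p_2 \sum_{i=1}^{L_1} \pi_{i,0} + p_1 \sum_{j=1}^{L_2} \pi_{0,j} + p_1(1-p_2)\pi_{L_1,0} + p_2(1-p_1)\pi_{0,L_2}, \qquad (11)$$

$$\Lambda(L_1,L_2) = \sum_{i=1}^{L_1} i\, \pi_{i,0} + \sum_{j=1}^{L_2} j\, \pi_{0,j}, \qquad (12)$$

for which

$$\pi_{i,0} = \pi_{0,0}\, \alpha^i, \qquad (13)$$

$$\pi_{0,j} = \pi_{0,0}/\alpha^j, \qquad (14)$$

$$\pi_{0,0} = \left(\sum_{i=0}^{L_1} \alpha^i + \sum_{j=0}^{L_2} \alpha^{-j} - 1\right)^{-1}, \qquad (15)$$

$$\alpha = \frac{(1-p_2)p_1}{(1-p_1)p_2}. \qquad (16)$$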


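Proof: Let $L_1$ and $L_2$ be the thresholds; our objective is to
find their optimal values. Let $X_i(t)$ be the number of type-$i$
packets at the beginning of the $t$th slot, before any arrival or
transmission. It is crucial to note that this observation time
differs from the time at which the MDP is observed. Under the
threshold policy, the bivariate stochastic process
$\{(X_1(t), X_2(t)), t \ge 0\}$ is a discrete-time Markov chain with
states $(0,0), (1,0), (2,0), \ldots, (L_1,0), (0,1), (0,2), \ldots,
(0,L_2)$. Define the parameter

$$\alpha = \frac{(1-p_2)p_1}{(1-p_1)p_2}.$$

Let $\pi_{i,j}$ be the steady-state probabilities of the Markov chain.
The chain moves from $(i-1,0)$ to $(i,0)$ only when a type-1 packet
alone arrives (probability $p_1(1-p_2)$), and from $(i,0)$ to
$(i-1,0)$ only when a type-2 packet alone arrives (probability
$(1-p_1)p_2$); the states $(0,j)$ behave symmetrically. Equating the
probability flow across each cut therefore gives the balance
equations, for $0 < i \le L_1$ and $0 < j \le L_2$:

$$\pi_{i,0} = \alpha\, \pi_{i-1,0}, \qquad (17)$$

$$\pi_{0,j} = \pi_{0,j-1}/\alpha. \qquad (18)$$

Since $\pi_{0,0} + \sum_{i=1}^{L_1} \pi_{i,0} + \sum_{j=1}^{L_2} \pi_{0,j} = 1$, we have

$$\pi_{0,0} = \left(\sum_{i=0}^{L_1} \alpha^i + \sum_{j=0}^{L_2} \alpha^{-j} - 1\right)^{-1}, \qquad (19)$$

$$\pi_{i,0} = \pi_{0,0}\, \alpha^i, \qquad (20)$$

$$\pi_{0,j} = \pi_{0,0}/\alpha^j, \qquad (21)$$

for all $0 \le i \le L_1$ and $0 \le j \le L_2$.

The expected number of transmissions per slot (counting an
individual transmission and a paired transmission using network
coding each as one transmission) is

$$\Gamma(L_1,L_2) = p_1 p_2 \pi_{0,0} + p_2 \sum_{i=1}^{L_1} \pi_{i,0} + p_1 \sum_{j=1}^{L_2} \pi_{0,j} + p_1(1-p_2)\pi_{L_1,0} + p_2(1-p_1)\pi_{0,L_2}. \qquad (22)$$

Here the first three terms count coded transmissions, and the last
two count the uncoded transmissions that occur when a queue would
otherwise exceed its threshold. The average number of packets in the
system at the beginning of each slot is

$$\Lambda(L_1,L_2) = \sum_{i=1}^{L_1} i\, \pi_{i,0} + \sum_{j=1}^{L_2} j\, \pi_{0,j}. \qquad (23)$$

Thus the long-run average cost per slot is

$$C_t\, \Gamma(L_1,L_2) + C_h\, \Lambda(L_1,L_2), \qquad (24)$$

which we minimize to obtain the optimal thresholds $L_1^*$ and
$L_2^*$.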
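The closed-form stationary distribution (19)-(21) and the quantities (22)-(23) can be checked numerically. In the Python sketch below (hypothetical helper names, assuming $0 < p_1, p_2 < 1$ so that $\alpha$ is well defined), the chain of the proof is built by applying one slot of the threshold policy to each arrival pattern, its stationary distribution is approximated by power iteration, and the result is compared with the closed form.

```python
from typing import Dict, Tuple

State = Tuple[int, int]


def closed_form_pi(p1: float, p2: float, L1: int, L2: int) -> Dict[State, float]:
    """Eqs. (19)-(21); requires 0 < p1, p2 < 1."""
    alpha = (1 - p2) * p1 / ((1 - p1) * p2)
    norm = sum(alpha ** i for i in range(L1 + 1)) + sum(alpha ** -j for j in range(L2 + 1)) - 1
    pi00 = 1.0 / norm
    pi = {(0, 0): pi00}
    pi.update({(i, 0): pi00 * alpha ** i for i in range(1, L1 + 1)})
    pi.update({(0, j): pi00 * alpha ** -j for j in range(1, L2 + 1)})
    return pi


def one_slot(x: State, a1: int, a2: int, L1: int, L2: int) -> State:
    """Arrivals (a1, a2), then the threshold decision; returns X(t + 1)."""
    y1, y2 = x[0] + a1, x[1] + a2
    if y1 > 0 and y2 > 0:
        return y1 - 1, y2 - 1        # coded transmission
    if y1 > L1:
        return y1 - 1, y2            # uncoded type-1 transmission
    if y2 > L2:
        return y1, y2 - 1            # uncoded type-2 transmission
    return y1, y2                    # wait


def power_iteration_pi(p1: float, p2: float, L1: int, L2: int,
                       iters: int = 5000) -> Dict[State, float]:
    states = [(0, 0)] + [(i, 0) for i in range(1, L1 + 1)] + [(0, j) for j in range(1, L2 + 1)]
    arrivals = [((0, 0), (1 - p1) * (1 - p2)), ((1, 0), p1 * (1 - p2)),
                ((0, 1), (1 - p1) * p2), ((1, 1), p1 * p2)]
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: 0.0 for s in states}
        for s, mass in pi.items():
            for (a1, a2), pr in arrivals:
                nxt[one_slot(s, a1, a2, L1, L2)] += mass * pr
        pi = nxt
    return pi


def gamma_lambda(pi: Dict[State, float], p1: float, p2: float,
                 L1: int, L2: int) -> Tuple[float, float]:
    """Eq. (22) (transmissions per slot) and Eq. (23) (mean backlog)."""
    gamma = (p1 * p2 * pi[(0, 0)]
             + p2 * sum(pi[(i, 0)] for i in range(1, L1 + 1))
             + p1 * sum(pi[(0, j)] for j in range(1, L2 + 1))
             + p1 * (1 - p2) * pi[(L1, 0)] + p2 * (1 - p1) * pi[(0, L2)])
    lam = (sum(i * pi[(i, 0)] for i in range(1, L1 + 1))
           + sum(j * pi[(0, j)] for j in range(1, L2 + 1)))
    return gamma, lam


exact = closed_form_pi(0.6, 0.3, 3, 2)
approx = power_iteration_pi(0.6, 0.3, 3, 2)
print(max(abs(exact[s] - approx[s]) for s in exact))
```

For $p_1 = p_2 = 0.5$ and $L_1 = L_2 = 1$ we have $\alpha = 1$, so the three states are equally likely, giving $\Gamma = 7/12$ and $\Lambda = 2/3$.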



[Figure 2: plots not recoverable from the text. Panel titles:
"Waiting-Transmitting Trade-off"; "(p1, p2) = (0.5, 0.5)"; "Line
Network (p1, p2) = (0.5, 0.5)".]

Fig. 2. (a) Trade-off between average delay and number of
transmissions in a single relay using the queue-length threshold
policy for different Bernoulli arrival rates $(p_1, p_2)$. (b)
Comparison of the minimum average cost (per packet) in a single
relay with Bernoulli arrival rates (0.5, 0.5) for different
policies. (c) Comparison of different policies in a line network
with two intermediate nodes and two Bernoulli flows with mean
arrival rates (0.5, 0.5).

Whenever $C_h > 0$, it is relatively straightforward to obtain
$L_1^*$ and $L_2^*$. Since it costs $C_t$ to transmit a packet and
$C_h$ for a packet to wait one slot, it is better to transmit a
packet than to make it wait for more than $C_t/C_h$ slots. Thus
$L_1^*$ and $L_2^*$ are always less than $C_t/C_h$. Hence, by
completely enumerating $L_1$ and $L_2$ between 0 and $C_t/C_h$, we
can obtain $L_1^*$ and $L_2^*$. One could perhaps find faster
techniques than complete enumeration, but it certainly serves the
purpose.
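The enumeration just described can be sketched in Python as follows. This is an illustration under the assumptions of Theorem 8 ($0 < p_1, p_2 < 1$ and $C_h > 0$), with hypothetical function names; it evaluates (24) through the closed forms (11)-(16) for every pair $(L_1, L_2)$ with $0 \le L_i \le \lfloor C_t/C_h \rfloor$ and keeps the cheapest.

```python
import math
from typing import Tuple


def average_cost(p1: float, p2: float, L1: int, L2: int, c_t: float, c_h: float) -> float:
    """Eq. (24): C_t * Gamma + C_h * Lambda under thresholds (L1, L2)."""
    alpha = (1 - p2) * p1 / ((1 - p1) * p2)                 # Eq. (16)
    pi_i = [alpha ** i for i in range(L1 + 1)]               # unnormalized pi_{i,0}
    pi_j = [alpha ** -j for j in range(L2 + 1)]              # unnormalized pi_{0,j}
    pi00 = 1.0 / (sum(pi_i) + sum(pi_j) - 1)                 # Eq. (15)
    pi_i = [pi00 * v for v in pi_i]                          # Eq. (13)
    pi_j = [pi00 * v for v in pi_j]                          # Eq. (14)
    gamma = (p1 * p2 * pi00 + p2 * sum(pi_i[1:]) + p1 * sum(pi_j[1:])
             + p1 * (1 - p2) * pi_i[L1] + p2 * (1 - p1) * pi_j[L2])  # Eq. (11)
    lam = (sum(i * pi_i[i] for i in range(1, L1 + 1))
           + sum(j * pi_j[j] for j in range(1, L2 + 1)))     # Eq. (12)
    return c_t * gamma + c_h * lam


def optimal_thresholds(p1: float, p2: float, c_t: float, c_h: float) -> Tuple[int, int, float]:
    """Exhaustive search over 0 <= L1, L2 <= floor(C_t / C_h)."""
    L_max = math.floor(c_t / c_h)
    best = (0, 0, average_cost(p1, p2, 0, 0, c_t, c_h))
    for L1 in range(L_max + 1):
        for L2 in range(L_max + 1):
            cost = average_cost(p1, p2, L1, L2, c_t, c_h)
            if cost < best[2]:
                best = (L1, L2, cost)
    return best


print(optimal_thresholds(0.5, 0.5, c_t=5.0, c_h=1.0))
```

The search evaluates $(\lfloor C_t/C_h \rfloor + 1)^2$ closed-form costs, each in time linear in the thresholds, which is negligible for the ratios of interest; when $C_t < C_h$ only $(0, 0)$, the opportunistic policy, is examined.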

IV. Numerical Experiments and Results 

In this section we present several numerical results to demonstrate
the analytical formulation as well as its extensions. We study the
performance of the following policies:

1) Opportunistic Coding: when a packet arrives, coding is performed
if a compatible packet is available; otherwise, the packet is
transmitted immediately.

2) Queue-length-based threshold: a Stationary Deterministic (SD)
policy that our analysis suggests should be optimal for the
Bernoulli case.

3) Randomized queue-length-based threshold: a Stationary policy that
Randomizes (SR) over deterministic policies. We expect that it would
not perform any better than deterministic queue-length-based
policies.

4) Queue-length-plus-waiting-time-based thresholds: a History
Dependent policy (HR), which is likely to give the best possible
performance.

5) Waiting-time-based thresholds: an HR policy that we create for
comparison, to illustrate that history on its own is of limited
value.

We simulate these policies in two different cases: (i) a single
relay with Bernoulli arrivals (Figures 2(a) and 2(b)) and (ii) a line
network with 4 nodes in which the sources are Bernoulli (Figure
2(c)). Note that in the second case, since the departures from one
queue determine the arrivals into the other queue, the arrival
processes are significantly different from Bernoulli. Our
simulations are implemented in Java, and for each scenario we report
the average results of $10^5$ iterations.
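A minimal single-relay simulation in the spirit of these experiments is sketched below. The paper's simulator is written in Java and is not reproduced here; this is an independent Python illustration with hypothetical names, and it assumes that the opportunistic policy corresponds to thresholds $L_1 = L_2 = 0$ (a packet without a coding partner is sent immediately). It reports the average number of transmissions per slot and the average number of packets held over per slot, which can be compared with (22) and (23).

```python
import random
from typing import Tuple


def simulate_relay(p1: float, p2: float, L1: int, L2: int,
                   n_slots: int = 200_000, seed: int = 0) -> Tuple[float, float]:
    """Return (transmissions per slot, mean packets held over per slot)."""
    rng = random.Random(seed)
    x1 = x2 = 0                       # packets held at the start of the slot
    transmissions = 0
    held = 0
    for _ in range(n_slots):
        y1 = x1 + (1 if rng.random() < p1 else 0)   # Bernoulli arrivals
        y2 = x2 + (1 if rng.random() < p2 else 0)
        if y1 > 0 and y2 > 0:
            y1, y2 = y1 - 1, y2 - 1                  # coded XOR transmission
            transmissions += 1
        elif y1 > L1:
            y1 -= 1                                  # uncoded type-1 transmission
            transmissions += 1
        elif y2 > L2:
            y2 -= 1                                  # uncoded type-2 transmission
            transmissions += 1
        x1, x2 = y1, y2
        held += x1 + x2
    return transmissions / n_slots, held / n_slots


for name, (L1, L2) in [("opportunistic", (0, 0)), ("threshold (1, 1)", (1, 1))]:
    tx, backlog = simulate_relay(0.5, 0.5, L1, L2)
    print(f"{name}: {tx:.3f} transmissions/slot, {backlog:.3f} packets held/slot")
```

With $p_1 = p_2 = 0.5$, the closed forms give 0.75 transmissions per slot and no backlog for the opportunistic policy, versus $7/12 \approx 0.583$ transmissions and $2/3$ of a packet held per slot for thresholds (1, 1); the simulated values should be close to these.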

Our numerical studies illustrate that, as expected, a deterministic
queue-length-based policy is optimal in different network scenarios.
The results are intriguing, as they suggest that a near-perfect
trade-off between waiting and transmission costs can be achieved
using simple policies; coupled with optimal network-coding-aware
routing policies such as the one in our earlier work [4], such
policies have the potential to exploit the positive externalities
that network coding offers.

V. Conclusion 

In this paper we developed algorithms that explore the delicate
trade-off between waiting and transmitting when network coding is
available. We started by exploring the whole space of
history-dependent policies, but showed step by step how we could
move to simpler regimes, finally culminating in a stationary
deterministic queue-length threshold policy. The policy is
attractive because its simplicity enables us to characterize the
thresholds completely, and we can easily illustrate its performance
on multiple networks. We showed by simulation that the policy is
optimal in the Bernoulli arrival scenario and that it also performs
well in other situations, such as line networks. Our results also
have some bearing on the general problem of queuing networks with
shared resources, which we will explore in the future.